Semi-automatic document assembly with structured source data
نویسنده
چکیده
منابع مشابه
A Tool for Semi-Automatic Generation and Maintenance of Taxonomies from Semi-Structured Documents
This chapter introduces OntoExtractor, a tool for the semi-automatic generation of the taxonomy from a set of documents or data sources. The tool generates the taxonomy in a bottom-up fashion. Starting from structural analysis of the documents, it produces a set of clusters, which can be refined by a further grouping created by content analysis. Metadata describing the content of each cluster i...
متن کاملHeader Metadata Extraction from Semi-structured Documents Using Template Matching
With the recent proliferation of documents, automatic metadata extraction from document becomes an important task. In this paper, we propose a novel template matching based method for header metadata extraction form semi-structured documents stored in PDF. In our approach, templates are defined, and the document is considered as strings with format. Templates are used to guide finite state auto...
متن کاملRule-Based Information Extraction for Structured Data Acquisition using TextMarker
Information extraction is concerned with the location of specific items in (unstructured) textual documents, e.g., being applied for the acquisition of structured data. Then, the acquired data can be applied for mining methods requiring structured input data, in contrast to other text mining methods that utilize a bag-of-words approach. This paper presents a semi-automatic approach for structur...
متن کاملA survey on Automatic Text Summarization
Text summarization endeavors to produce a summary version of a text, while maintaining the original ideas. The textual content on the web, in particular, is growing at an exponential rate. The ability to decipher through such massive amount of data, in order to extract the useful information, is a major undertaking and requires an automatic mechanism to aid with the extant repository of informa...
متن کاملDocument Assembly with XML Structured Source Data
The recent evolution of electronic documents has resulted in high-level functionality in modern publishing processes. Document assembly provides a straightforward framework for such processes. In the heart of the framework lies the structure. The source documents are expected to have clear and versatile structures that facilitate complex and dynamic processing. XML as a technology enables defin...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2001